Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
Similar resources
Improving Machine Translation Performance by Exploiting Non-Parallel Corpora
We present a novel method for discovering parallel sentences in comparable, non-parallel corpora. We train a maximum entropy classifier that, given a pair of sentences, can reliably determine whether or not they are translations of each other. Using this approach, we extract parallel data from large Chinese, Arabic, and English non-parallel newspaper corpora. We evaluate the quality of the extr...
Improving Named Entity Translation by Exploiting Comparable and Parallel Corpora
Translation of named entities (NEs), such as person, organization, country, and location names, is very important for several natural language processing applications. It plays a vital role in applications like cross-lingual information retrieval and machine translation. Web and news documents introduce new named entities on a regular basis. Those new names cannot be captured by ordinary machine ...
Improving Machine Translation Performance Using Comparable Corpora
The overwhelming majority of the languages in the world are spoken by fewer than 50 million native speakers, and automatic translation of many of these languages is less investigated due to the lack of linguistic resources such as parallel corpora. In the ACCURAT project we will work on novel methods by which comparable corpora can compensate for this shortage and improve machine translation systems ...
Exploiting Variant Corpora for Machine Translation
This paper proposes the usage of variant corpora, i.e., parallel text corpora that are equal in meaning but use different ways to express content, in order to improve corpus-based machine translation. The usage of multiple training corpora of the same content with different sources results in variant models that focus on specific linguistic phenomena covered by the respective corpus. The propos...
Improving Statistical Machine Translation Using Comparable Corpora
Title of dissertation: Improving Statistical Machine Translation Using Comparable Corpora. Matthew Garvey Snover, Doctor of Philosophy, 2010. Dissertation directed by: Professor Bonnie Dorr, Department of Computer Science. With thousands of languages in the world, and the increasing speed and quantity of information being distributed across the world, automatic translation between languages by comp...
Journal
Journal title: Computational Linguistics
Year: 2005
ISSN: 0891-2017,1530-9312
DOI: 10.1162/089120105775299168